We study the high-frequency price dynamics of traded stocks by a model of returns using a semi-Markov
approach. More precisely, we assume that the intraday returns are described by a discrete time
homogeneous semi-Markov process and the overnight returns are modeled by a Markov chain.
Based on these assumptions we derive the equations for the first passage time distribution and the
volatility autocorrelation function. Theoretical results are compared with empirical findings
from real data. In particular, we analyze high-frequency data from the Italian stock market from
the first of January 2007 until the end of December 2010. The semi-Markov hypothesis is also tested
through a nonparametric test of hypothesis.



I. INTRODUCTION 

Semi-Markov processes (SMP) are a wide class of stochastic processes which generalize both
Markov chains and renewal processes. Their main advantage is that any type of waiting time
distribution can be used to model the time to a transition from one state to another. This
flexibility has a price: more data are needed to estimate the parameters of the model, which are
more numerous. Semi-Markov processes also generalize the non-Markovian models based on
continuous time random walks used extensively in the econophysics community, see for example
[1, 2]. SMP have been used to analyze financial data and to describe different problems ranging
from credit rating data modeling [3] to the pricing of options [4, 5].

With the financial industry becoming fully computerized, the amount of recorded data, from daily
close all the way down to tick-by-tick level, has exploded. Nowadays, such tick-by-tick
high-frequency data are readily available for practitioners and researchers alike [6, 7]. It
therefore seemed natural to us to try to verify the semi-Markov hypothesis of returns on
high-frequency data.

We propose a semi-Markov model for price returns. More precisely, we assume that the intraday
returns (up to one minute frequency) are described by a discrete time homogeneous semi-Markov
process and the overnight returns are modeled by a Markov chain. In this way we can treat the
intraday and the overnight activities differently. To establish the validity of our model we
tested it first of all by using a nonparametric test proposed in [8] and then against two of the
stylized facts which characterize financial data: the first passage time distribution [9, 10]
and the autocorrelation function of the square of returns.

Following the model, we determine equations for the first passage time distributions and the
intraday autocorrelation function by using renewal type arguments. Results from the model are
then compared with empirical results obtained from the data. We show that these stylized facts
are better reproduced when the semi-Markov model is used than with a simple Markov chain.

The database used for the analysis is made of high-frequency tick-by-tick price data for all the
stocks in the Italian stock market from the first of January 2007 until the end of December 2010.
From prices we then define returns at one minute frequency.

The paper is organized as follows. First, semi-Markov processes, notation and some results are
described in Section II. Next, the price model is illustrated and the first passage time
distributions and the intraday autocorrelation functions are computed in Section III. Finally, in
Section IV, an application to real high-frequency data illustrates the results.



II. SEMI-MARKOV PROCESSES 

We define a homogeneous semi-Markov process (HSMP) with values in a finite state space
$E = \{1, 2, \ldots, m\}$, see for example [11, 12]. Let $(\Omega, \mathcal{F}, P)$ be a
probability space; we consider two sequences of random variables:

$$J_n : \Omega \to E; \qquad T_n : \Omega \to \mathbb{N}$$

denoting, respectively, the state and the time of the n-th transition of the system.

We assume that $(J_n, T_n)$ is a Markov renewal process on the state space $E \times \mathbb{N}$
with kernel $Q_{ij}(t)$, $i, j \in E$, $t \in \mathbb{N}$.

The kernel has the following probabilistic interpretation:

$$P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid \sigma(J_h, T_h),\ h \le n,\ J_n = i]
= P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid J_n = i] = Q_{ij}(t) \tag{II.1}$$

and it results that $p_{ij} = \lim_{t \to \infty} Q_{ij}(t)$, $i, j \in E$, $t \in \mathbb{N}$,
where $P = (p_{ij})$ is the transition probability matrix of the embedded Markov chain $J_n$.

Furthermore, it is useful to introduce the probability of having the next transition into state j
exactly at time t, given that the process starts at time zero from state i:

$$b_{ij}(t) = P[J_{n+1} = j,\ T_{n+1} - T_n = t \mid J_n = i] =
\begin{cases} Q_{ij}(t) - Q_{ij}(t-1) & \text{if } t > 0 \\ 0 & \text{if } t = 0 \end{cases} \tag{II.2}$$

We define the distribution functions

$$H_i(t) = P[T_{n+1} - T_n \le t \mid J_n = i] = \sum_{j \in E} Q_{ij}(t) \tag{II.3}$$

representing the distribution function of the sojourn time in state i, so that $1 - H_i(t)$ is the
corresponding survival function.

The Radon-Nikodym theorem assures the existence of a function $G_{ij}(t)$ such that

$$G_{ij}(t) = P\{T_{n+1} - T_n \le t \mid J_n = i,\ J_{n+1} = j\} =
\begin{cases} Q_{ij}(t)/p_{ij} & \text{if } p_{ij} \ne 0 \\ 1 & \text{if } p_{ij} = 0 \end{cases} \tag{II.4}$$

It denotes the waiting time distribution function in state i given that, with the next transition,
the process will be in state j. The sojourn time distribution $G_{ij}(\cdot)$ can be any
distribution function. We recover the discrete time Markov chain when the $G_{ij}(\cdot)$ are all
geometrically distributed.
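The relation between the kernel, the sojourn time distribution and the geometric special case can be checked numerically. The following is a minimal sketch, not part of the original analysis: the two-state matrix `A`, the horizon `tmax` and the helper names `kernel_increments` and `sojourn_cdf` are illustrative assumptions. It builds $b_{ij}(t) = p_{ij}\, g_{ij}(t)$ from geometric waiting times and verifies that $H_i(t) = 1 - A_{ii}^t$, as expected for a Markov chain with self-loop probability $A_{ii}$.

```python
import numpy as np

def kernel_increments(P, g):
    """b[t, i, j] = p_ij * g_ij(t); g[0] is zero, so b[0] = 0 as in (II.2)."""
    return P[None, :, :] * g

def sojourn_cdf(b):
    """H[t, i] = sum_j Q_ij(t), where Q is the running sum of b over time (II.3)."""
    Q = np.cumsum(b, axis=0)
    return Q.sum(axis=2)

# Hypothetical two-state Markov chain with self-loop probabilities A_ii.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
tmax = 50
stay = np.diag(A)
# Embedded chain: jump probabilities conditional on leaving the state (p_ii = 0).
P = (A - np.diag(stay)) / (1.0 - stay)[:, None]
t = np.arange(tmax + 1)
# Geometric waiting-time pmf g_ij(t) = A_ii^(t-1) (1 - A_ii) for t >= 1.
g = np.zeros((tmax + 1, 2, 2))
g[1:] = (stay[None, :] ** (t[1:, None] - 1) * (1.0 - stay)[None, :])[:, :, None]

b = kernel_increments(P, g)
H = sojourn_cdf(b)
print(H[3], 1.0 - stay ** 3)
```

Replacing `g` with any other waiting-time pmf gives a genuinely semi-Markov kernel with the same embedded chain.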

It is possible to define the HSMP $Z(t)$ as

$$Z(t) = J_{N(t)}, \quad \forall t \in \mathbb{N} \tag{II.5}$$

where $N(t) = \sup\{n \in \mathbb{N} : T_n \le t\}$. Then $Z(t)$ represents the state of the system
at each time t. We denote the transition probabilities of the HSMP by
$\phi_{ij}(t) = P[Z(t) = j \mid Z(0) = i]$. They satisfy the following evolution equation:

$$\phi_{ij}(t) = \delta_{ij}\,(1 - H_i(t)) + \sum_{k \in E} \sum_{\tau = 1}^{t} b_{ik}(\tau)\, \phi_{kj}(t - \tau). \tag{II.6}$$

To solve equation (II.6) there are well-known algorithms in the SMP literature [12, 15].
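Because (II.6) only involves values of $\phi$ at earlier times, it can be solved by direct forward recursion. The sketch below is one straightforward way to do so and is not necessarily the algorithm of the cited references; the function name `transition_probabilities` and the test kernel are assumptions. As a sanity check it uses a geometric kernel, for which $\phi(t)$ must coincide with the t-th power of the Markov transition matrix `A`.

```python
import numpy as np

def transition_probabilities(b, tmax):
    """Solve (II.6) forward in time; b[t, i, j] are kernel increments with b[0] = 0."""
    m = b.shape[1]
    H = np.cumsum(b, axis=0).sum(axis=2)      # H[t, i] as in (II.3)
    phi = np.zeros((tmax + 1, m, m))
    for t in range(tmax + 1):
        phi[t] = np.diag(1.0 - H[t])          # no transition up to time t
        for tau in range(1, t + 1):           # first transition at time tau
            phi[t] += b[tau] @ phi[t - tau]
    return phi

# Hypothetical Markov chain written as a semi-Markov kernel (geometric sojourns).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
tmax = 10
stay = np.diag(A)
off = A - np.diag(stay)
b = np.zeros((tmax + 1, 2, 2))
for t in range(1, tmax + 1):
    b[t] = (stay ** (t - 1))[:, None] * off   # b_ij(t) = A_ii^(t-1) A_ij

phi = transition_probabilities(b, tmax)
print(phi[5])
```

The double loop costs of the order of `tmax**2` matrix products, which is acceptable for intraday horizons of a few hundred minutes.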

At this point we introduce the discrete backward recurrence time process linked to the SMP. For
each time $t \in \mathbb{N}$ we define the following stochastic process:

$$B(t) = t - T_{N(t)}. \tag{II.7}$$

We call it the discrete backward recurrence time process.

If the semi-Markov process $Z(t)$ indicates the state of the system at time t, $B(t)$ indicates
the time elapsed since the last jump.

In Figure 1 we show a trajectory of an HSMP. At time t the process $Z(t)$ is in the state
$J_{h-1}$ and the last transition occurred at time $T_{h-1}$; then at time t the backward
process takes the value $B(t) = t - T_{h-1}$.
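Definitions (II.5) and (II.7) translate directly into a lookup on the sequence of jump times. The following is a small sketch under the assumption that the jump times $T_0 < T_1 < \ldots$ and the visited states $J_0, J_1, \ldots$ are stored in two parallel lists; the function name and the sample trajectory are hypothetical.

```python
import bisect

def state_and_backward_time(jump_times, states, t):
    """Return (Z(t), B(t)) for increasing jump times T_n and visited states J_n."""
    # N(t) = sup{n : T_n <= t}; bisect_right counts the jump times that are <= t.
    n = bisect.bisect_right(jump_times, t) - 1
    if n < 0:
        raise ValueError("t precedes the first recorded transition")
    return states[n], t - jump_times[n]

# Hypothetical trajectory: transitions at times 0, 3, 4 and 9.
jump_times = [0, 3, 4, 9]
states = [2, 1, 3, 2]
print(state_and_backward_time(jump_times, states, 7))
print(state_and_backward_time(jump_times, states, 3))
```

At a jump time the backward process restarts from zero, which is why the pair (state, 0) plays the role of a renewal point in the equations that follow.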

The joint stochastic process $(Z(t), B(t),\ t \in \mathbb{N})$ with values in
$E \times \mathbb{N}$ is a Markov process, see for example [11]. That is:

$$P[Z(T) = j,\ B(T) \le v' \mid \sigma(Z(h), B(h)),\ h \le t,\ Z(t) = i,\ B(t) = v]
= P[Z(T) = j,\ B(T) \le v' \mid Z(t) = i,\ B(t) = v].$$

FIG. 1. Trajectory of a HSMP with backward times

To save space, let us denote the event $\{Z(0) = i, B(0) = v\}$ in a more compact form by $(i, v)$.

In the sequel of the paper we will make use of the following probabilities:

$${}^{b}\phi^{b}_{ij}(v; v', t) = P[Z(t) = j,\ B(t) = v' \mid (i, v)]. \tag{II.8}$$

Our next step is to compute ${}^{b}\phi^{b}_{ij}(v; v', t)$ as a function of the semi-Markov
kernel. The results below have been proved in [3] and further developed in [14].

For all states $i, j \in E$ and times $v, v', t \in \mathbb{N}$ such that $H_i(v) < 1$ we have that

$${}^{b}\phi^{b}_{ij}(v; v', t) = \delta_{ij}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}\, 1_{\{v' = t + v\}}
+ \sum_{k \in E} \sum_{s = 1}^{t} \frac{b_{ik}(s + v)}{1 - H_i(v)}\; {}^{b}\phi^{b}_{kj}(0; v', t - s). \tag{II.9}$$

Notice that ${}^{b}\phi^{b}_{ij}(0; v', t)$ satisfies the following system of equations, obtained
from (II.9) with $v = 0$:

$${}^{b}\phi^{b}_{ij}(0; v', t) = \delta_{ij}\, [1 - H_i(t)]\, 1_{\{v' = t\}}
+ \sum_{k \in E} \sum_{s = 1}^{t} b_{ik}(s)\; {}^{b}\phi^{b}_{kj}(0; v', t - s).$$

This system can be solved with algorithms similar to those used for equation (II.6).



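The marginal probability ${}^{b}\phi_{ij}(v; t) = P[Z(t) = j \mid (i, v)]$ is obtained by summing
over the backward values: ${}^{b}\phi_{ij}(v; t) = \sum_{v' = 0}^{t + v} {}^{b}\phi^{b}_{ij}(v; v', t)$.
Consequently:

$${}^{b}\phi_{ij}(v; t) = \delta_{ij}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}
+ \sum_{k \in E} \sum_{s = 1}^{t} \frac{b_{ik}(s + v)}{1 - H_i(v)}\, \phi_{kj}(t - s). \tag{II.10}$$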
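Equation (II.10) expresses the transition probabilities with initial backward time v through the ordinary transition probabilities of (II.6). The sketch below implements it under illustrative assumptions (function names and test kernel are hypothetical). For a geometric kernel the process has no memory of v, so the result must again equal the matrix power of the Markov matrix `A`, which provides a simple check.

```python
import numpy as np

def transition_probabilities(b, tmax):
    """phi[t] from the evolution equation (II.6)."""
    m = b.shape[1]
    H = np.cumsum(b, axis=0).sum(axis=2)
    phi = np.zeros((tmax + 1, m, m))
    for t in range(tmax + 1):
        phi[t] = np.diag(1.0 - H[t])
        for tau in range(1, t + 1):
            phi[t] += b[tau] @ phi[t - tau]
    return phi

def backward_transition_probabilities(b, v, t):
    """Matrix of P[Z(t) = j | Z(0) = i, B(0) = v] following (II.10).

    Requires b to cover times up to t + v and H_i(v) < 1 for every state i.
    """
    H = np.cumsum(b, axis=0).sum(axis=2)
    phi = transition_probabilities(b, t)
    surv_v = 1.0 - H[v]
    out = np.diag((1.0 - H[t + v]) / surv_v)
    for s in range(1, t + 1):
        out += (b[s + v] / surv_v[:, None]) @ phi[t - s]
    return out

# Hypothetical geometric kernel derived from a Markov matrix A.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
stay = np.diag(A)
off = A - np.diag(stay)
b = np.zeros((11, 2, 2))
for t in range(1, 11):
    b[t] = (stay ** (t - 1))[:, None] * off

bphi = backward_transition_probabilities(b, v=3, t=4)
print(bphi)
```

With a non-geometric kernel the same function returns probabilities that do depend on v, which is exactly the memory effect the semi-Markov model is meant to capture.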



III. THE PRICE MODEL 



Let us assume that the value of the asset under study is described by the time-varying asset
price $S(t)$. The time variable $t \in \{0, 1, \ldots, nd\}$, where n is the number of unit
periods during the day (i.e. minutes) and d is the number of days.

The intraday return at time t, calculated over a time interval of length 1, is defined as

$$Z(t) = \frac{S(t + 1) - S(t)}{S(t)}, \tag{III.11}$$

while if $t = nk$, $k = 1, 2, \ldots, d$, we define the overnight return as

$$X(t) = \frac{S(t + 1) - S(t)}{S(t)}. \tag{III.12}$$

We assume that $Z(t)$ is a discrete time HSMP with finite state space

$$E = \{-z_{\min}\Delta, \ldots, -2\Delta, -\Delta, 0, \Delta, 2\Delta, \ldots, z_{\max}\Delta\}$$

and kernel $\mathbf{Q} = (Q_{ij}(\gamma))$, $\forall i, j \in E$ and $\gamma \in \mathbb{N}$. In
contrast, we describe $X(t)$ as a discrete time homogeneous Markov chain with the same state
space and transition probability matrix $\mathbf{T} = (t_{ij})_{i, j \in E}$.

We made this choice to take into consideration two different types of market activity: one
(intraday) when the market is open and the second one (overnight) when, even if the market is
closed, the opening price reflects the information accumulated while trading was halted. We
define a single expression taking into account both the intraday and the overnight returns:



$$W(t) = \begin{cases} Z(t) & \text{if } (k - 1)n < t < nk \\ X(t) & \text{if } t = nk \end{cases} \tag{III.13}$$

where $k = 1, 2, \ldots, d$.

A. The first passage time distributions

The accumulation factor from t to t + τ is given by

$$M_t(\tau) = \prod_{k = 0}^{\tau - 1} (1 + W(t + k)) \tag{III.14}$$

and takes values in the set

$$SP_\tau = \Big\{x \in \mathbb{R} : x = \prod_{k = 0}^{\tau - 1} (1 + i(t + k)),\ i(t + k) \in E\Big\}.$$

More generally, we introduce the symbol $SP_\tau^\rho = SP_\tau \cap (-\infty, \rho)$ to denote the
set of accumulation factor values that are less than ρ at time τ.

It is easy to verify the relation between $M_t(\tau)$ and the price $S(t)$:

$$M_t(\tau) = \frac{S(t + \tau)}{S(t)}. \tag{III.15}$$

The first passage time (fpt) for an investment made at time t at price $S(t)$ is defined as the
time interval $\tau = t' - t$, $t' > t$, where the relation $M_t(\tau) \ge \rho$ is fulfilled for
the first time. We will denote the fpt as $\lambda_\rho(t)$. Then

$$\lambda_\rho(t) = \min\{\tau > 0;\ M_t(\tau) \ge \rho\}.$$

We assume that the semi-Markov process $Z(t)$ is time homogeneous, so we can simply write
$\lambda_\rho(t) = \lambda_\rho$. We are interested in finding the distributional properties of
the fpt. For each time t, let

$$R_i(v, t; \rho) = P(\lambda_\rho > t \mid (i, v))$$

where $i \in E$ denotes the state of the return and $v \in \mathbb{N}$ the time already spent in
this state, both at time zero.

Let us define by $R_{i,j}(v, t; w, \rho)$, $\forall w \in SP_t$, $\forall \rho \in \mathbb{R}$,
the probability

$$P(\lambda_\rho > t,\ W(t) = j,\ M_0(t + 1) = w \mid (i, v));$$

obviously

$$R_i(v, t; \rho) = \sum_{j \in E}\ \sum_{x \in SP_t,\ x < \rho} R_{i,j}(v, t; x, \rho). \tag{III.16}$$



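Below we derive the equation for the fpt distribution in the proposed model. In the following we
distinguish different cases. The first case is for times $1 \le t \le n - 1$. This means that we
are interested in determining $R_{i,j}(v, t; w, \rho)$ for times t belonging to the first day.

Let $T_1$ and $J_1$ denote the time and the arrival state of the first transition after time
zero. Since the events $\{T_1 = k\}$ are disjoint, it follows that

$$\begin{aligned}
R_{i,j}(v, t; w, \rho)
&= P(\lambda_\rho > t,\ W(t) = j,\ M_0(t + 1) = w,\ T_1 > t \mid (i, v)) \\
&\quad + P(\lambda_\rho > t,\ W(t) = j,\ M_0(t + 1) = w,\ T_1 \le t \mid (i, v)).
\end{aligned} \tag{III.17}$$

The first addend on the right hand side (r.h.s.) of (III.17) is equal to:

$$\begin{aligned}
&P(\forall u \in (0, t + 1],\ M_0(u) < \rho \mid W(t) = j,\ T_1 > t,\ (i, v)) \\
&\quad \cdot P(W(t) = j \mid T_1 > t,\ (i, v)) \cdot P(T_1 > t \mid (i, v)) \\
&= 1_{\{(1 + i\Delta)^{t + 1} < \rho\}}\, \delta_{ij}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}.
\end{aligned}$$

The second addend on the r.h.s. of (III.17) is equal to:

$$\begin{aligned}
&\sum_{a \in E} \sum_{m = 1}^{t} P(\forall u \in (0, t + 1],\ M_0(u) < \rho,\ M_0(t + 1) = w,\ W(t) = j,\ T_1 = m,\ J_1 = a \mid (i, v)) \\
&= \sum_{a \in E} \sum_{m = 1}^{t} P(\forall u \in (m, t + 1],\ M_0(m) M_m(u - m) < \rho,\ M_0(m) M_m(t + 1 - m) = w,\ W(t) = j \mid \\
&\qquad\qquad \forall u \in (0, m],\ M_0(u) < \rho,\ T_1 = m,\ J_1 = a) \\
&\qquad \cdot P(T_1 = m,\ J_1 = a \mid (i, v)) \cdot P(\forall u \in (0, m],\ M_0(u) < \rho \mid T_1 = m,\ J_1 = a) \\
&= \sum_{a \in E} \sum_{m = 1}^{t} \frac{b_{ia}(v + m)}{1 - H_i(v)}\, 1_{\{(1 + i\Delta)^m < \rho\}} \\
&\qquad \cdot P(\forall u \in (m, t + 1],\ (1 + i\Delta)^m M_m(u - m) < \rho,\ (1 + i\Delta)^m M_m(t + 1 - m) = w,\ W(t) = j \mid T_1 = m,\ J_1 = a) \\
&= \sum_{a \in E} \sum_{m = 1}^{t} \frac{b_{ia}(v + m)}{1 - H_i(v)}\, 1_{\{(1 + i\Delta)^m < \rho\}}\,
R_{a,j}\Big(0, t - m; \frac{w}{(1 + i\Delta)^m}, \frac{\rho}{(1 + i\Delta)^m}\Big),
\end{aligned}$$

where the last equality uses the time homogeneity of the process.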

This proves the following renewal-type equation for the fpt when the horizon time t belongs to
the first day:

$$R_{i,j}(v, t; w, \rho) = 1_{\{(1 + i\Delta)^{t + 1} < \rho\}}\, \delta_{ij}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}
+ \sum_{a \in E} \sum_{m = 1}^{t} \frac{b_{ia}(v + m)}{1 - H_i(v)}\, 1_{\{(1 + i\Delta)^m < \rho\}}\,
R_{a,j}\Big(0, t - m; \frac{w}{(1 + i\Delta)^m}, \frac{\rho}{(1 + i\Delta)^m}\Big). \tag{III.18}$$

Now let us consider the case in which $t = n$. This means that we work until the opening of the
second day. In this situation we must take care of the transition at time $t = n$, which is due
to the Markov chain $X(t)$. To obtain a formula for the fpt until time n it is sufficient to
consider all possible states for the return and for the accumulation factor at time $n - 1$ and
to use equation (III.18):

$$\begin{aligned}
R_{i,j}(v, n; w, \rho)
&= \sum_{\tilde w \in SP^\rho_{n - 1}} \sum_{a \in E} P(\forall u \in (0, n + 1],\ M_0(u) < \rho,\ M_0(n + 1) = w,\ W(n) = j,\ W(n - 1) = a,\ M_0(n) = \tilde w \mid (i, v)) \\
&= \sum_{\tilde w \in SP^\rho_{n - 1}} \sum_{a \in E} P(M_0(n)(1 + W(n)) = w,\ W(n) = j \mid \forall u \in (0, n],\ M_0(u) < \rho,\ W(n - 1) = a,\ M_0(n) = \tilde w,\ (i, v)) \\
&\qquad \cdot P(\forall u \in (0, n],\ M_0(u) < \rho,\ W(n - 1) = a,\ M_0(n) = \tilde w \mid (i, v)) \\
&= \sum_{\tilde w \in SP^\rho_{n - 1}} \sum_{a \in E} P\Big((1 + W(n)) = \frac{w}{\tilde w},\ W(n) = j \mid W(n - 1) = a\Big) \\
&\qquad \cdot P(\forall u \in (0, n],\ M_0(u) < \rho,\ W(n - 1) = a,\ M_0(n) = \tilde w \mid (i, v)) \\
&= \sum_{\tilde w \in SP^\rho_{n - 1}} \sum_{a \in E} t_{a,j}\, 1_{\{(1 + j\Delta) = w / \tilde w\}}\, R_{i,a}(v, n - 1; \tilde w, \rho).
\end{aligned} \tag{III.19}$$

By similar computations it is possible to obtain the fpt distribution for time $t = nd$. The
relation is the following:

$$R_{i,j}(v, nd; w, \rho) = \sum_{a \in E}\ \sum_{\tilde w \in SP^\rho_{(d - 1)n}}
R_{i,a}(v, (d - 1)n; \tilde w, \rho)\, R_{a,j}\Big(0, n; \frac{w}{\tilde w}, \frac{\rho}{\tilde w}\Big). \tag{III.20}$$

Formula (III.20) is obtained by conditioning on all possible states of the return process $W(t)$
and on all possible values of the accumulation factor $M_0(t)$ at time $t = (d - 1)n$ (the
closing of day $d - 1$).
The last case, when $(d - 1)n < t < nd$, can be obtained by using jointly formulae (III.18) and
(III.20). The resulting relation is the following:

$$R_{i,j}(v, t; w, \rho) = \sum_{a \in E}\ \sum_{\tilde w \in SP^\rho_{(d - 1)n}}
R_{i,a}(v, (d - 1)n; \tilde w, \rho)\, R_{a,j}\Big(0, t - (d - 1)n; \frac{w}{\tilde w}, \frac{\rho}{\tilde w}\Big). \tag{III.21}$$

Formula (III.21) is obtained by conditioning on the states of the return process and on the
values of the accumulation factor process at time $t = (d - 1)n$ and then by using formula
(III.18).

Formulae (III.18), (III.19), (III.20) and (III.21) allow us to compute the probability
$R_{i,j}(v, t; w, \rho)$ for all times t. It should be noted that if ρ is not too large, it is
highly probable that the accumulation factor process exceeds ρ within the day. In this case
probabilities (III.19), (III.20) and (III.21) will be equal to zero. Consequently, the fpt
distribution will have nonzero values only for $1 \le t \le n - 1$. In this case the survival
probability $R_i(v, t; \rho)$ satisfies the following simpler equation:

$$R_i(v, t; \rho) = 1_{\{(1 + i\Delta)^{t + 1} < \rho\}}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}
+ \sum_{a \in E} \sum_{m = 1}^{t} \frac{b_{ia}(v + m)}{1 - H_i(v)}\, 1_{\{(1 + i\Delta)^m < \rho\}}\,
R_a\Big(0, t - m; \frac{\rho}{(1 + i\Delta)^m}\Big), \tag{III.22}$$

which is obtained from (III.18) through relation (III.16).
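Equation (III.22) can be evaluated by a memoized recursion, since each call only refers to shorter horizons and rescaled thresholds. The following is a minimal sketch, not the procedure used in the paper: the factory name `make_fpt_survival`, the two return values and the waiting-time pmf are hypothetical, and states are indexed by integers with `returns[i]` playing the role of the return value of state i. The recursion follows the indicator convention $(1 + i\Delta)^{t+1} < \rho$ adopted above.

```python
import numpy as np
from functools import lru_cache

def make_fpt_survival(returns, b):
    """Return R(i, v, t, rho) = P(fpt > t | (i, v)) following (III.22).

    returns[i] is the return attached to state i; b[t, i, a] are kernel
    increments with b[0] = 0, covering all times up to t + v.
    """
    H = np.cumsum(b, axis=0).sum(axis=2)
    n_states = len(returns)

    @lru_cache(maxsize=None)
    def R(i, v, t, rho):
        growth = 1.0 + returns[i]
        surv_v = 1.0 - H[v, i]
        # No transition up to t: the factor grows deterministically in state i.
        total = (1.0 - H[t + v, i]) / surv_v if growth ** (t + 1) < rho else 0.0
        for m in range(1, t + 1):          # first transition at time m
            factor = growth ** m
            if factor >= rho:
                continue                   # threshold already reached before m
            for a in range(n_states):
                weight = b[v + m, i, a] / surv_v
                if weight > 0.0:
                    total += weight * R(a, 0, t - m, rho / factor)
        return total

    return R

# Hypothetical two-state example: the process alternates between a negative
# and a positive return, with sojourn times 1, 2 or 3 minutes.
returns = [-0.001, 0.001]
pmf = [0.0, 0.5, 0.3, 0.2]
b = np.zeros((12, 2, 2))
for t, p in enumerate(pmf):
    b[t, 0, 1] = p
    b[t, 1, 0] = p

R = make_fpt_survival(returns, b)
print(R(0, 0, 5, 1.0005))
```

The cache is keyed on floating point thresholds, which is adequate for a small illustration; a production implementation would more likely tabulate the reachable values of the accumulation factor.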



B. The intraday autocorrelation function

In this subsection we derive the equation for the intraday autocorrelation function. Let us
denote by

$$\gamma_i(x, v; t, s) = \mathrm{Cov}(M_x(t + 1),\ M_x(t + s + 1) \mid Z(x) = i,\ B(x) = v). \tag{III.23}$$

From now on we work under the assumption that $Kn < x < x + t < x + t + s < (K + 1)n$. This
means that the autocorrelation function is analyzed for times within the same day; for this
reason we will refer to it as the intraday autocorrelation function.

Notice that, because the semi-Markov process $Z(t)$ is time-homogeneous, the autocorrelation
function (III.23) can be equivalently expressed independently of x in the following simpler form:

$$\gamma_i(v; t, s) = \mathrm{Cov}(M_0(t + 1),\ M_0(t + s + 1) \mid Z(0) = i,\ B(0) = v)
= \mathrm{Cov}_{(i,v)}\Big(\prod_{k = 0}^{t} (1 + W(k)),\ \prod_{k = 0}^{t + s} (1 + W(k))\Big). \tag{III.24}$$

To compute the autocorrelation function (III.24) we need the expected accumulation factor,
denoted by

$$m_i(v; t) = E[M_0(t + 1) \mid (i, v)]. \tag{III.25}$$

Since $1 \le t \le n - 1$ we have that $W(k) = Z(k)$ and then

$$m_i(v; t) = E\Big[\prod_{k = 0}^{t} (1 + Z(k)) \mid (i, v)\Big].$$

Let us consider the random variable $\prod_{k = 0}^{t} (1 + Z(k))$; it is possible to give a
recursive representation of this random variable. In fact

$$\prod_{k = 0}^{t} (1 + Z(k)) \stackrel{d}{=} 1_{\{T_1 > t \mid Z(0) = i, B(0) = v\}} \prod_{k = 0}^{t} (1 + i\Delta)
+ \sum_{a \in E} \sum_{\theta = 1}^{t} 1_{\{T_1 = \theta, J_1 = a \mid Z(0) = i, B(0) = v\}}\, (1 + i\Delta)^{\theta} \prod_{k = \theta}^{t} (1 + Z(k)) \tag{III.26}$$

where the symbol $A \stackrel{d}{=} B$ denotes that the two random variables A and B have the
same distribution.

By taking the expectation in (III.26) and by using the independence between
$1_{\{T_1 = \theta, J_1 = a \mid Z(0) = i, B(0) = v\}}$ and $\prod_{k = \theta}^{t} (1 + Z(k))$
given the information set $\{Z(\theta) = a, B(\theta) = 0\}$, we get

$$m_i(v; t) = (1 + i\Delta)^{t + 1}\, P(T_1 > t \mid (i, v))
+ \sum_{a \in E} \sum_{\theta = 1}^{t} (1 + i\Delta)^{\theta}\, P(T_1 = \theta,\ J_1 = a \mid (i, v))\,
E\Big[\prod_{k = \theta}^{t} (1 + Z(k)) \mid Z(\theta) = a,\ B(\theta) = 0\Big],$$

that is

$$m_i(v; t) = (1 + i\Delta)^{t + 1}\, \frac{1 - H_i(t + v)}{1 - H_i(v)}
+ \sum_{a \in E} \sum_{\theta = 1}^{t} (1 + i\Delta)^{\theta}\, \frac{b_{ia}(v + \theta)}{1 - H_i(v)}\, m_a(0; t - \theta). \tag{III.27}$$
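The recursion (III.27) can be solved by first tabulating $m_a(0; u)$ for increasing u and then applying the equation once more for the required backward time v. The sketch below is illustrative only: the function names, the return values and the Markov matrix `A` are assumptions. For a geometric kernel the expected accumulation factor has the closed form $(DA)^t D \mathbf{1}$ with $D = \mathrm{diag}(1 + \text{returns})$, which is used as a check.

```python
import numpy as np

def _renewal_step(growth, b, H, m0, v, t):
    """Right-hand side of (III.27) for all starting states at once."""
    surv_v = 1.0 - H[v]
    out = growth ** (t + 1) * (1.0 - H[t + v]) / surv_v
    for theta in range(1, t + 1):
        out = out + growth ** theta * (b[v + theta] @ m0[t - theta]) / surv_v
    return out

def expected_accumulation(returns, b, v, t):
    """Vector of m_i(v; t) = E[M_0(t + 1) | (i, v)] over the states i."""
    H = np.cumsum(b, axis=0).sum(axis=2)
    growth = 1.0 + np.asarray(returns, dtype=float)
    m0 = np.zeros((t + 1, len(growth)))        # m0[u, a] = m_a(0; u)
    for u in range(t + 1):
        m0[u] = _renewal_step(growth, b, H, m0, 0, u)
    return _renewal_step(growth, b, H, m0, v, t)

# Hypothetical geometric kernel built from a Markov matrix A.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
returns = [-0.001, 0.001]
stay = np.diag(A)
off = A - np.diag(stay)
b = np.zeros((12, 2, 2))
for k in range(1, 12):
    b[k] = (stay ** (k - 1))[:, None] * off

m = expected_accumulation(returns, b, v=2, t=4)
D = np.diag(1.0 + np.array(returns))
print(m, np.linalg.matrix_power(D @ A, 4) @ (1.0 + np.array(returns)))
```

The same tabulated values of $m_a(0; u)$ can be reused when the second order cross moment is computed.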



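To evaluate the autocorrelation function we also need the second order cross moment of the
accumulation factor

$$m_i^{(2)}(v; t, s) = E_{(i,v)}\Big[\prod_{k = 0}^{t} (1 + Z(k)) \prod_{k = 0}^{t + s} (1 + Z(k))\Big]. \tag{III.28}$$

Also in this case we can give a recursive representation of the random variable
$\prod_{k = 0}^{t} (1 + Z(k)) \prod_{k = 0}^{t + s} (1 + Z(k))$. In fact it holds true that

$$\begin{aligned}
\prod_{k = 0}^{t} (1 + Z(k))^2 \prod_{k = t + 1}^{t + s} (1 + Z(k)) \stackrel{d}{=}\
& 1_{\{T_1 > t + s \mid (i, v)\}}\, (1 + i\Delta)^{2(t + 1)} (1 + i\Delta)^{s} \\
& + \sum_{a \in E} \sum_{\theta = t + 1}^{t + s} 1_{\{T_1 = \theta, J_1 = a \mid (i, v)\}}\,
(1 + i\Delta)^{2(t + 1)} (1 + i\Delta)^{\theta - t - 1} \prod_{k = \theta}^{t + s} (1 + Z(k)) \\
& + \sum_{a \in E} \sum_{\theta = 1}^{t} 1_{\{T_1 = \theta, J_1 = a \mid (i, v)\}}\,
(1 + i\Delta)^{2\theta} \prod_{k = \theta}^{t} (1 + Z(k))^2 \prod_{k = t + 1}^{t + s} (1 + Z(k)).
\end{aligned} \tag{III.29}$$

By taking the expectation in (III.29) and by using the independence between
$1_{\{T_1 = \theta, J_1 = a \mid (i, v)\}}$ and the random variables
$\big[\prod_{k = \theta}^{t + s} (1 + Z(k))\big]$ and
$\big[\prod_{k = \theta}^{t} (1 + Z(k))^2 \prod_{k = t + 1}^{t + s} (1 + Z(k))\big]$ given the
information set $\{Z(\theta) = a, B(\theta) = 0\}$, we get the following recursive equation for
the second order cross moment

$$\begin{aligned}
m_i^{(2)}(v; t, s) =\ & \frac{1 - H_i(t + s + v)}{1 - H_i(v)}\, (1 + i\Delta)^{2(t + 1) + s} \\
& + \sum_{a \in E} \sum_{\theta = t + 1}^{t + s} \frac{b_{ia}(v + \theta)}{1 - H_i(v)}\, (1 + i\Delta)^{\theta + t + 1}\, m_a(0; t + s - \theta) \\
& + \sum_{a \in E} \sum_{\theta = 1}^{t} \frac{b_{ia}(v + \theta)}{1 - H_i(v)}\, (1 + i\Delta)^{2\theta}\, m_a^{(2)}(0; t - \theta, s).
\end{aligned} \tag{III.30}$$

By solving equations (III.27) and (III.30) we can obtain the intraday autocorrelation volatility
function through the following relation:

$$\gamma_i(v; t, s) = m_i^{(2)}(v; t, s) - m_i(v; t)\, m_i(v; t + s). \tag{III.31}$$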




IV. APPLICATION TO REAL HIGH FREQUENCY DATA



FIG. 2. Number of transitions for the embedded Markov chain



A. Database description 

The data used in this work are tick-by-tick quotes of indexes and stocks downloaded from
www.borsaitaliana.it for the period January 2007-December 2010 (4 full years). The data have been
re-sampled to a 1 minute frequency. Consider a single day (say day k with 1 ≤ k ≤ d), where d is
the number of trading days in the time series. In our case we consider four years of trading
(starting from the first of January 2007), corresponding to d = 1076. The market in Italy fixes
the opening price at a random time in the first minute after 9 am; continuous trading starts
immediately after and ends just before 5.25 pm; finally, the closing price is fixed just after
5.30 pm. Therefore, let us define S(t) as the price of the last trade before 9.01.00 am, S(t + 1)
as the price of the last trade before 9.02.00 am, and so on until S(nk) as the price of the last
trade before 5.25.00 pm. If there are no transactions in a minute, the price remains unchanged
(even when the stock is suspended and reopened in the same day). Also define S(nk + 1) as the
opening price and S(nk) as the closing price. With this choice n = 507. There was a small
difference before the 28th of September 2009, since continuous trading started at 9.05 am, and
therefore prior to that date we have n = 502. Finally, if the stock has a delay in opening or
closes in advance (suspended but not reopened), only the effective trading minutes are taken
into account. In this case n will be smaller than 507. The number of returns analyzed is then
roughly 508000 for each stock. We analyzed all the stocks in the FTSEMIB, which are the 40 most
capitalized stocks in the Italian stock market.

To be able to model returns as a semi-Markov process, the state space has to be discretized. In
the example shown in this work we discretized returns into 5 states chosen to be symmetrical
with respect to the zero return. Returns are in fact already discretized in real data due to the
discretization of stock prices. We therefore tried to remain as close as possible to this
discretization. In Figure 2 we show an example of the number of transitions from state i to all
other states for the embedded Markov chain.

From the discretized returns we estimated the probabilities P and $G_{ij}(t)$ to generate a
synthetic trajectory of the semi-Markov process modeled as described in Section III. For
comparison, we also generated a synthetic trajectory which follows a simple Markov model with
transition probability matrix estimated from the real data. We thus ended up with three
trajectories: one representing real data, the second one a semi-Markov trajectory and the last
one a Markov chain. The three time series are used in the following to compare results on fpt
distributions and on autocorrelations.
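Generating a synthetic semi-Markov trajectory from an embedded transition matrix and waiting-time distributions can be sketched as follows. This is an illustration rather than the simulation code used for the paper: the function name `simulate_semi_markov`, the data layout `g[i][j]` (a pmf over waiting times 1, 2, ...) and the toy inputs are assumptions. At each step the next state is drawn from row i of P, then the sojourn time is drawn from the pmf attached to the pair (i, j).

```python
import numpy as np

def simulate_semi_markov(P, g, start, length, rng):
    """Simulate `length` time steps of a discrete-time semi-Markov process.

    P[i, j] is the embedded transition matrix; g[i][j] is an array whose
    k-th entry is the probability of a sojourn of k + 1 time units in
    state i before jumping to state j.
    """
    path = np.empty(length, dtype=int)
    state, t = start, 0
    while t < length:
        nxt = rng.choice(len(P), p=P[state])
        wait = 1 + rng.choice(len(g[state][nxt]), p=g[state][nxt])
        path[t:t + wait] = state        # slice is truncated at the end of path
        t += wait
        state = nxt
    return path

# Hypothetical two-state example with deterministic sojourns of 2 time units.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
two_steps = np.array([0.0, 1.0])
g = [[None, two_steps],
     [two_steps, None]]
path = simulate_semi_markov(P, g, start=0, length=8, rng=np.random.default_rng(0))
print(path)
```

With estimated P and waiting-time pmfs in place of the toy inputs, the same loop yields a trajectory of state indices that can be mapped back to return values.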



B. Test 

The semi-Markov hypothesis is tested by applying a test of hypothesis proposed in [8] and briefly
described below. As already stated, the model can be considered semi-Markovian if the sojourn
times are not geometrically distributed. The probability distribution function of the sojourn
time in state i before making a transition to state j has been denoted by $G_{ij}(\cdot)$. Define
the corresponding probability mass function by

$$g_{ij}(t) = P\{T_{n+1} - T_n = t \mid J_n = i,\ J_{n+1} = j\} =
\begin{cases} G_{ij}(t) - G_{ij}(t - 1) & \text{if } t > 1 \\ G_{ij}(1) & \text{if } t = 1 \end{cases} \tag{IV.32}$$



Under the geometrical hypothesis the equality $g_{ij}(1)(1 - g_{ij}(1)) - g_{ij}(2) = 0$ must
hold; a sufficiently strong deviation from this equality therefore has to be interpreted as
evidence in favor of the semi-Markov model. The test statistic is the following:

$$\hat S_{ij} = \frac{\sqrt{N(i, j)}\, \big(\hat g_{ij}(1)(1 - \hat g_{ij}(1)) - \hat g_{ij}(2)\big)}
{\sqrt{\hat g_{ij}(1)\, (1 - \hat g_{ij}(1))^2\, (2 - \hat g_{ij}(1))}} \tag{IV.33}$$

where $N(i, j)$ denotes the number of transitions from state i to state j observed in the sample
and $\hat g_{ij}(x)$ is the empirical estimator of the probability $g_{ij}(x)$, given by the
ratio between the number of transitions from i to j occurring exactly after x units of time and
$N(i, j)$. Under the geometrical hypothesis $H_0$ (or Markovian hypothesis), this statistic has
approximately the standard normal distribution, see [8].

state i    state j    score     decision
i = 3      j = 1       9.638    H0 rejected
i = 3      j = 2      13.752    H0 rejected
i = 3      j = 4      13.527    H0 rejected
i = 3      j = 5      10.199    H0 rejected

TABLE I. Results of the Test

FIG. 3. First passage time distribution for ρ = 1.005 (horizontal axis: $\lambda_\rho$ in minutes)

We applied this procedure to our data, executing the tests at the 5% significance level (95%
confidence). Because we have 5 states, we estimated the 5 × (5 − 1) = 20 waiting time
distribution functions and for each of them we computed the value of the test statistic (IV.33).
The geometric hypothesis is rejected for 15 of the 20 distributions. Due to lack of space, we do
not report all the values of the test statistic, but they are available upon request. In Table I
we show the results of the test applied to the waiting time distribution functions with starting
state i = 3.

The large values of the test statistic suggest the rejection of the Markovian hypothesis in
favor of the more general semi-Markov one.
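The statistic (IV.33) only needs the observed waiting times for one ordered pair of states. The following sketch computes it from a plain list of waiting times; the function name and the sample data are hypothetical, and the empirical frequency of waiting time 1 is assumed to lie strictly between 0 and 1 so that the denominator is nonzero.

```python
import math
from collections import Counter

def geometric_test_statistic(waiting_times):
    """Test statistic (IV.33) for the waiting times observed between one pair of states."""
    n = len(waiting_times)
    counts = Counter(waiting_times)
    g1 = counts[1] / n      # empirical probability of a sojourn of length 1
    g2 = counts[2] / n      # empirical probability of a sojourn of length 2
    denom = math.sqrt(g1 * (1.0 - g1) ** 2 * (2.0 - g1))
    return math.sqrt(n) * (g1 * (1.0 - g1) - g2) / denom

# Hypothetical samples: the first matches geometric proportions for lengths 1 and 2,
# the second has no sojourns of length 2 at all.
geometric_like = [1] * 50 + [2] * 25 + [3] * 25
non_geometric = [1] * 50 + [3] * 50
print(geometric_test_statistic(geometric_like))
print(geometric_test_statistic(non_geometric))
```

Under the null hypothesis the value is compared with standard normal quantiles, for example 1.96 in absolute value for a two-sided test at the 5% level.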



C. Results on first passage time distribution

For each of the stocks in our database we estimate the first passage time distribution directly
from the data (real data) and from the two synthetic time series generated as described above.

It is not possible to show all the results here; we therefore show only one figure of the fpt
distribution, obtained for one stock (FIAT) and one value of ρ (1.005).

From Figure 3 it is evident that, even if the semi-Markov model does not reproduce exactly the
fpt distribution of real data, it works much better than the simpler Markov model.

D. Results on autocorrelation function

Another important feature of the stochastic process that describes financial time series is the
autocorrelation of the square of returns. Indeed, while returns are uncorrelated, their absolute
values or, equivalently, their squares are autocorrelated with a specific decaying structure.
This is well observed in our data as shown, again just for one stock, in Figure 4. In the figure
we also compare the results obtained directly from data with those obtained by the use of our
semi-Markov model.

The autocorrelation of the square of returns is defined as

$$\Sigma(t, t + \tau) = \mathrm{Cov}(W^2(t + \tau),\ W^2(t)). \tag{IV.34}$$

To compute (IV.34) observe that:

$$E_{(i,v)}[W^2(t + \tau)\, W^2(t)] = \sum_{j \in E} \sum_{h \in E} \sum_{v' = 0}^{t + v}
{}^{b}\phi^{b}_{ih}(v; v', t)\; {}^{b}\phi_{hj}(v'; \tau)\, h^2 j^2, \tag{IV.35}$$

$$E_{(i,v)}[W^2(t + \tau)] = \sum_{j \in E} {}^{b}\phi_{ij}(v; t + \tau)\, j^2, \tag{IV.36}$$

$$E_{(i,v)}[W^2(t)] = \sum_{h \in E} {}^{b}\phi_{ih}(v; t)\, h^2. \tag{IV.37}$$

If we assume that the process is in the stationary regime, then
$\Sigma(\tau) := \lim_{t \to \infty} \Sigma(t, t + \tau)$ is independent of t and can be
expressed by using the stationary distribution of the Markov chain $(W(t), B(t))$ studied in
[15]. The following formulas allow the computation of the autocovariance:

$$E[W^2(t + \tau)\, W^2(t)] = \sum_{j \in E} \sum_{h \in E} \sum_{v' \ge 0}
\pi_h(v')\; {}^{b}\phi_{hj}(v'; \tau)\, h^2 j^2, \tag{IV.38}$$

$$E[W^2(t + \tau)] = \sum_{j \in E} \pi_j\, j^2, \tag{IV.39}$$

$$E[W^2(t)] = \sum_{h \in E} \pi_h\, h^2, \tag{IV.40}$$

where $\pi_h(v') = \frac{1 - H_h(v')}{\mu_{hh}}$, $\pi_j = \sum_{v' \ge 0} \pi_j(v')$ and
$\mu_{ii}$ is the mean recurrence time of state i for the semi-Markov process $W(t)$.

FIG. 4. Autocorrelation function of $W^2(t)$ (curves: real data, semi-Markov, Markov; horizontal axis: time lag in minutes)

Again we show in the figure that the semi-Markov model, even if it is still far from the results
on real data, can give much better results than a simple Markov model.
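On the empirical side, the autocovariance of squared returns in a real or synthetic series can be estimated with a plain sample average. The sketch below is a generic estimator and not necessarily the one used to produce Figure 4; the function name and the toy series are assumptions, and the series is treated as stationary.

```python
import numpy as np

def squared_return_autocovariance(w, max_lag):
    """Sample autocovariance of w**2 for lags 0..max_lag (stationarity assumed)."""
    sq = np.asarray(w, dtype=float) ** 2
    mean = sq.mean()
    n = len(sq)
    return np.array([
        np.mean((sq[lag:] - mean) * (sq[:n - lag] - mean))
        for lag in range(max_lag + 1)
    ])

# Hypothetical toy series alternating between two return magnitudes.
w = [1.0, 2.0] * 50
acov = squared_return_autocovariance(w, max_lag=2)
print(acov)
```

Dividing by the lag-zero value turns the autocovariance into an autocorrelation, which makes curves from real, semi-Markov and Markov series directly comparable.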



V. CONCLUSIONS 

In this work we introduced a semi-Markov process to model high-frequency stock returns. The
model has been used to obtain both theoretical and empirical results on the first passage time
distribution and on the autocorrelation function of the square of returns. We were able to
calculate analytically the fpt distribution and the autocorrelation function and also to
generate a synthetic time series starting from real data. We have shown, by means of Monte Carlo
simulations, that the semi-Markov model reproduces the results seen on real data much better
than a simple Markov model. This suggests that the semi-Markov environment should be preferred
when modeling the stock market.